Image processing device, image processing method, and program

ABSTRACT

An image processing method includes a specifying step of specifying a camera to be adjusted among multiple cameras based on the history of similarities for an object searched from images captured by the multiple cameras, a calculation step of calculating similarities for the images captured by the multiple cameras with respect to the object of the search target, and an addition step of adding an adjustment value to a similarity for an image captured by the specified camera among the similarities calculated at the calculation step.

BACKGROUND OF THE INVENTION

Field of the Invention

The present invention relates to an image processing device, an image processing method, and a program.

Description of the Related Art

In recent years, a human search system has been proposed, in which multiple monitoring cameras placed in a monitoring area are connected to each other via a network to cooperate with each other and, e.g., a specific human or a stray can be searched from multiple images captured by these multiple cameras.

In the human search system of this type, the captured image of each of the multiple cameras is analyzed to detect a human body, and a feature amount indicating object features is extracted from the face of the detected human body or a human body image. The extracted feature amount is associated with information such as image capturing time, the camera having captured the image, and the image of the human body recognized as the human, and is registered as registration information on such a human in the human search system.

When the specific human etc. are searched, the feature amount is extracted from a search target image, and the feature amount of the extracted search target image and the feature amounts of the registration information on multiple humans registered in the human search system are collated. In such collation of the feature amounts, a similarity indicating the likelihood that the registered human is the same as the specific human targeted for search is calculated, and the registration information on the human for which a similarity equal to or higher than a predetermined threshold has been calculated is searched. The registration information on the multiple humans searched as described above is lined up in a similarity order or a detection time order, and is listed as search results on, e.g., a display device.

Japanese Patent Laid-Open No. 2009-27393 discloses the technique of updating features of a search target object by means of a search target object image selected by a user in a video image search system configured to acquire and hold input images from multiple cameras.

Specifically, the video image search system of Japanese Patent Laid-Open No. 2009-27393 has a condition specifying portion configured to specify a human feature, time, and a camera from the input images, an image search portion configured to search an image matching the conditions specified by the condition specifying portion from a held input image group, and a result display portion configured to display image search results. A user selects and inputs, in an interactive mode, appropriateness on whether or not a human image displayed on the result display portion is the same as a human specified by the condition specifying portion. The human feature of the image determined as correct by the user is updated in such a manner that such a human feature is added to or integrated with held human features.

According to the technique described in Japanese Patent Laid-Open No. 2009-27393, the user selects the appropriateness on whether or not the image displayed on the result display portion and having a high similarity is a search target human, and the human feature determined as correct is expanded. Thus, even for an image including the same human but having a different appearance, the accuracy of human search is improved.

However, in the human search system configured such that the multiple cameras cooperate with each other, an installation location for each camera varies in the monitoring area, and therefore, installation conditions such as the angle of view, illumination conditions, etc. are different among the cameras. Moreover, performance such as a resolution and a frame rate varies among the multiple cameras in many cases. For this reason, even for the captured image of the same human, the feature amount of an object whose image has been captured and shape information such as orientation vary among the multiple cameras.

Specifically, due to various image capturing condition differences such as an environment with insufficient illuminance, low camera performance such as a resolution, and an unfavorable camera installation angle, there is a camera tending to have a low similarity in collation between the search target and the registered image. Thus, bias (an individual difference) in human search result output is caused among the multiple cameras.

When the search results are displayed, search results for which a similarity equal to or higher than a predetermined threshold has been calculated are listed as in the technique described in Japanese Patent Laid-Open No. 2009-27393. Thus, the search result of the camera tending to have a low similarity is sometimes omitted from the list, and is missed as a monitoring target even when the search result shows the same human. Thus, there is a probability that the accuracy of human search is lowered.

There is a need in the art to provide an image processing device, an image processing method, and a program for properly performing image search by means of captured images from multiple cameras regardless of a difference in image capturing conditions among the cameras.

SUMMARY OF THE INVENTION

For solving the above-described issues, a search result display processing method in a monitoring system is provided, the method including the specifying step of specifying a camera to be adjusted among multiple cameras based on the history of a similarity for an object searched from images captured by the multiple cameras, the calculation step of calculating similarities for the images captured by the multiple cameras with respect to the object of a search target, the addition step of adding an adjustment value to a similarity for an image captured by the specified camera among the similarities calculated at the calculation step, and the display processing step of performing the processing of displaying images in descending order of the similarity among the image having the similarity to which the adjustment value has been added at the addition step and an image captured by a camera different from the camera specified at the specifying step.

Further features of the present invention will become apparent from the following description of exemplary embodiments (with reference to the attached drawings).

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram of one example of a network configuration of a human search system according to an embodiment of the present disclosure.

FIG. 2 is a diagram of one example of a hardware configuration of a network camera according to the present embodiment.

FIG. 3 is a diagram of one example of a functional configuration of each device forming the human search system according to the present embodiment.

FIG. 4 is a table of one example of human information managed by a human search server according to the present embodiment.

FIG. 5 is a flowchart of one example of a human information registration processing procedure according to the present embodiment.

FIG. 6 is a flowchart of one example of a human search processing procedure according to the present embodiment.

FIG. 7 is a flowchart of one example of a detailed procedure of adjustment processing of FIG. 6.

FIG. 8 is a table of one example of human search results extracted by human information extraction processing of FIG. 7.

FIG. 9 is a view of one example of a display screen for displaying processing results of human search processing.

FIG. 10 is a view of one example of the display screen after inter-camera adjustment has been instructed on the display screen of FIG. 9.

FIG. 11 is a view of one example of the display screen after displaying in a camera order has been instructed on the display screen of FIG. 10.

FIG. 12 is a view of one example of the display screen on which an instruction for similarity threshold adjustment is input for each camera.

FIG. 13 is a view of one example of the display screen after a scroll bar has been operated on the display screen of FIG. 12.

DESCRIPTION OF THE EMBODIMENTS

Hereinafter, an embodiment for implementing the present disclosure will be described with reference to the attached drawings. Note that the embodiment described below is one example as a technique for implementing the present disclosure, and needs to be corrected or changed as necessary according to a device configuration or various conditions to which the present disclosure is applied. The present disclosure is not limited to the embodiment below. Moreover, all combinations of features described in the present embodiment are not necessarily essential for solution of the present disclosure.

In the present embodiment, images captured by multiple image capturing devices are analyzed. When an image search result for a search target is output from image analysis processing results, the image capturing device having a lower similarity between the search target and a registered image than those calculated for other image capturing devices is identified. Then, in the present embodiment, the similarity calculated for the identified image capturing device can be adjusted such that the image captured by such an image capturing device is included in the output image search results. Moreover, when a low similarity is calculated, the identified image capturing device and the search result for the image captured by such an image capturing device can be notified in distinction from other image capturing devices and the image search results thereof.

With this configuration, bias in the image search results to be output among the multiple image capturing devices due to, e.g., a difference in image capturing conditions is adjusted. Thus, failure in output of the search results for the same human as the search target is reduced, and the accuracy of image search is improved.

Hereinafter, the “similarity” indicates, in the present embodiment, the likelihood that a human registered as an image analysis result is the same as a specific human targeted for search. The similarity can be used as a threshold for determining whether or not output as an image search result is to be made, and in the present embodiment, can be set separately for each of the multiple image capturing devices.

Moreover, the “image capturing conditions” of the image capturing device include, but are not limited to, installation conditions such as the angle of view, illumination conditions, and image capturing device performance such as a resolution and a frame rate in image capturing of the image capturing device, and include any conditions influencing analysis and search of a captured image and being different among the multiple image capturing devices. Note that in the present embodiment, the case of applying a network camera as the image capturing device for monitoring will be described below as an example, but the present embodiment is not limited to this case. The present embodiment is also applicable to other image search purposes. Moreover, in the present embodiment, a case where a captured image is analyzed to detect a human body and the same human as a human of a search target is searched from human registration information including a feature amount of the detected human body will be described below as an example, but the search target to which the present embodiment is applicable is not limited to the above. The present embodiment is applicable to any object image search purposes including a moving object and a still object in a captured image.

Network Configuration of Present Embodiment

FIG. 1 is a diagram of one example of a network configuration in the case of applying a human search system according to the present embodiment to a network camera system.

A network camera system 10 of FIG. 1 includes at least two network cameras 20 a, 20 b, an image analysis server 30, a human search server 40, a network storage 50, and a search terminal device 60. The network cameras 20 a, 20 b, the image analysis server 30, the human search server 40, the network storage 50, and the search terminal device 60 are connected to each other via a network 70, thereby exchanging information with each other.

The network 70 may be, for example, a wired local area network (LAN) according to communication standards such as Ethernet (registered trademark). Alternatively, the network 70 may include a wireless network. The wireless network includes wireless personal area networks (PANs) such as Bluetooth (registered trademark), ZigBee (registered trademark), and Ultra WideBand (UWB). Moreover, the wireless network includes a wireless local area network (LAN) such as Wireless Fidelity (Wi-Fi) (registered trademark), and a wireless metropolitan area network (MAN) such as WiMAX (registered trademark). Further, the wireless network includes a wireless wide area network (WAN) such as LTE/3G. Note that the communication standards, size, and configuration are not limited to the above as long as the network 70 communicably connects each type of equipment.

The network cameras (hereinafter also simply referred to as “cameras”) 20 a, 20 b are image capturing devices, such as monitoring cameras, configured to capture an image of an object at a predetermined angle of view. These cameras 20 a, 20 b can transmit, via the network 70, the captured images (hereinafter also simply referred to as “images”) to the image analysis server 30, the human search server 40, and the network storage 50. Note that in FIG. 1, two cameras 20 a, 20 b are illustrated, but the number of cameras may be two or more and is not limited to the illustrated number.

The image analysis server 30 is configured to read, via the network 70, e.g., data of the captured images recorded in the network storage 50, thereby executing image analysis processing. Specifically, the image analysis server 30 detects a human body from the image acquired from the network storage 50, extracts a feature amount of the detected human body, generates human information including the feature amount of the extracted human body, and registers the human information in the network storage 50. The “human information” is information (object information) on a human recognized from an image, the information including image capturing time, an object ID, a camera ID, a feature amount of a human body detected from the image, an image (a human image) of the human body recognized as the human, and the attribute of the human. Details of the human information will be described later with reference to FIG. 4. Note that the entirety or part of the image analysis processing executed by the image analysis server 30 may be executed on the cameras 20 a, 20 b.

The human search server 40 is configured to execute human search processing when human search is instructed by a user. Specifically, the human search server 40 extracts a feature amount of an input search target, collates the extracted feature amount of the search target and the feature amounts of multiple humans registered as the human information to calculate a similarity, and outputs, as a search result, a human for which a similarity equal to or higher than a predetermined threshold has been calculated.
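
As one illustration of this collation, the sketch below computes a similarity between a query feature amount and each registered feature amount and keeps the registered humans at or above the threshold. The cosine measure, the 0-to-1000 scale (matching FIG. 8 described later), and all names are assumptions for illustration, not the embodiment's prescribed method.

    import numpy as np

    def collate(query_feature, registered_humans, threshold=500):
        """Return registered humans whose similarity to the query is at
        or above the threshold, highest similarity first."""
        q = np.asarray(query_feature, dtype=float)
        results = []
        for human in registered_humans:  # each entry is a human-information record
            f = np.asarray(human["feature"], dtype=float)
            # cosine similarity mapped onto a 0-1000 scale
            sim = 1000.0 * float(q @ f) / (np.linalg.norm(q) * np.linalg.norm(f))
            if sim >= threshold:
                results.append((sim, human))
        return sorted(results, key=lambda r: r[0], reverse=True)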

The network storage 50 is a recording device configured to record information such as the captured images delivered from the cameras 20 a, 20 b, the human information transmitted from the image analysis server 30, the human search results transmitted from the human search server 40, and various types of control information. The network storage 50 functions as an external non-volatile storage device for the cameras 20 a, 20 b, the image analysis server 30, and the human search server 40. The entirety or part of the information recorded in the network storage 50 may be recorded in a local storage device of the cameras 20 a, 20 b, the image analysis server 30, and the human search server 40. In this case, the network storage 50 may be omitted as necessary.

The search terminal device 60 includes a display device (a display), and has the display control function of reproducing and displaying the images delivered from the cameras 20 a, 20 b and the images recorded in the network storage 50 and displaying, e.g., human search processing results described later on the display device. Moreover, the search terminal device 60 includes a user interface and an input unit for human search executed by the human search server 40, and transmits a human search processing request to the human search server 40 when the user instructs human search.

Further, the search terminal device 60 has the function of performing parameter setting operation, such as threshold setting, regarding the image analysis processing executed by the image analysis server 30 and the human search processing executed by the human search server 40.

Hardware Configuration of Network Camera

FIG. 2 is a diagram of one example of a hardware configuration of the cameras 20 a, 20 b.

The camera 20 a, 20 b of FIG. 2 includes a CPU 21, a ROM 22, a RAM 23, an external memory 24, an image capturing portion 25, an input portion 26, a communication I/F 27, and a system bus 28.

The CPU 21 is configured to control overall operation in the camera 20 a, 20 b, and is configured to control each component (22 to 27) via the system bus 28.

The ROM 22 is a non-volatile memory configured to store, e.g., control programs necessary for execution of various types of processing by the CPU 21. Note that these control programs etc. may be stored in the external memory 24 or a detachable storage medium (not shown).

The RAM 23 functions as a main memory, a work area, etc. for the CPU 21. That is, the CPU 21 loads the necessary programs etc. from the ROM 22 into the RAM 23 upon execution of various types of processing, thereby executing the programs etc. to implement various types of functional operation.

The external memory 24 is, for example, configured to store various types of data, various types of information, etc. necessary for performing processing by means of the programs by the CPU 21. Moreover, the external memory 24 is, for example, configured to store various types of data, various types of information, etc. obtained by the processing performed by means of the programs etc. by the CPU 21.

The image capturing portion 25 includes, for example, a lens configured to capture an image of an object and an image capturing element. The lens is an optical lens configured to form, on the image capturing element, an image of incident light from an object targeted for image capturing, and is configured to focus the incident light on the image capturing element. The image capturing element is an element configured to convert light into an image signal, and may include a complementary metal oxide semiconductor (CMOS) and a charge coupled device (CCD), for example.

The input portion 26 includes a power supply button etc., and the user of the camera 20 a, 20 b can provide an instruction to the camera 20 a, 20 b via the input portion 26.

The communication I/F 27 is an interface for communication with an external device (e.g., the image analysis server 30) connected to the network 70, and is a LAN interface, for example. The system bus 28 communicably connects the CPU 21, the ROM 22, the RAM 23, the external memory 24, the image capturing portion 25, the input portion 26, and the communication I/F 27 to each other.

The function of each portion of the camera 20 a, 20 b illustrated in FIG. 2 is implemented in such a manner that the CPU 21 executes the program stored in the ROM 22 or the external memory 24.

Note that the image analysis server 30, the human search server 40, and the search terminal device 60 may include hardware such as a display device instead of the image capturing portion 25 illustrated in FIG. 2. The display device may include a monitor such as a liquid crystal display (LCD). Moreover, the image analysis server 30, the human search server 40, and the search terminal device 60 include, as the input portion 26, a keyboard or a pointing device such as a mouse, and the user can provide an instruction to each of the devices 30, 40, 60.

Functional Configuration of Network Camera System

FIG. 3 is a block diagram of one example of a functional configuration of each device forming the human search system according to the present embodiment.

Of the function modules of each device illustrated in FIG. 3, the functions implemented by software are provided by programs stored in a memory such as the ROM. These functions are implemented in such a manner that the programs are read out to the RAM and executed by the CPU. For functions implemented by hardware, a dedicated circuit may be automatically generated, by means of a predetermined compiler, on an FPGA from the program for implementing the function of each function module, for example. FPGA stands for field programmable gate array. As in the FPGA, a gate array circuit may be formed, and may be implemented as the hardware. Alternatively, the hardware may be implemented by an application specific integrated circuit (ASIC). Note that the configuration of function blocks illustrated in FIG. 3 is one example. Multiple function blocks may form a single function block, or any of the function blocks may be divided into blocks for performing multiple functions.

The camera 20 a, 20 b includes an image acquiring portion 201, an encoding portion 202, and a communication portion 203. In the camera 20 a, 20 b, the image acquiring portion 201 is configured to acquire the captured image. The encoding portion 202 is configured to encode the image acquired by the image acquiring portion 201. The communication portion 203 is configured to deliver the image encoded by the encoding portion 202 to the network 70. The image delivered to the network 70 is transmitted to the network storage 50, the image analysis server 30, and the search terminal device 60.

The network storage 50 includes a recording portion 501 and a communication portion 502. In the network storage 50, the recording portion 501 is configured to record the image received by the communication portion 502 in the storage device. The communication portion 502 is configured to receive the image from the camera 20 a, 20 b via the network 70, thereby supplying the image to the recording portion 501.

The image analysis server 30 includes a human body detection portion 301, a feature amount extraction portion 302, a human information transmission portion 303, and a communication portion 304. In the image analysis server 30, the human body detection portion 301 is configured to detect a human body from the image recorded in the recording portion 501 of the network storage 50. Note that the human body detection portion 301 may utilize, for enhancing the accuracy of detection of the human body, results of human body tracking, face detection, and face tracking, for example.

The feature amount extraction portion 302 is configured to extract a feature amount of the human body detected by the human body detection portion 301. The human information transmission portion 303 is configured to associate the feature amount of the human body extracted by the feature amount extraction portion 302 with the image capturing time, the object ID, the camera ID, an image (a human image) of the human body recognized as a human, the attribute of the human, etc., thereby generating the human information. The generated human information is transmitted to the human search server 40 by the human information transmission portion 303 via the communication portion 304.

The communication portion 304 is configured to transmit the human information supplied from the human information transmission portion 303 to the human search server 40 via the network 70. Note that the human information transmission portion 303 may transmit the generated human information to the network storage 50, and may record such information in the recording portion 501.

The human search server 40 includes a human information management portion 401, a search target feature amount extraction portion 402, a search portion 403, a camera identification portion 404, an adjustment portion 405, and a communication portion 406. In the human search server 40, the human information management portion 401 is configured to register and manage, in a storage device, the human information transmitted from the human information transmission portion 303 of the image analysis server 30 via the network 70. The search target feature amount extraction portion 402 is configured to receive, via the communication portion 406, a request for searching for a human targeted for searching from the search terminal device 60 and detect a human body from an image specified by the received human search request, thereby extracting a feature amount of the detected human body as the feature amount of the human targeted for searching.

The search portion 403 is configured to search the human information managed and registered by the human information management portion 401. Specifically, the search portion 403 collates the feature amount, which is extracted by the search target feature amount extraction portion 402, of the human targeted for searching and the feature amount of the human information managed and registered by the human information management portion 401, thereby calculating, as the search result, the similarity between both of the feature amounts.

The camera identification portion 404 is configured to tabulate the similarities calculated by the search portion 403 for each of the cameras 20 a, 20 b, thereby identifying one or more cameras 20 a, 20 b for which a relatively lower similarity than those of the other cameras has been tabulated.

The adjustment portion 405 is configured to calculate an adjustment value for adjusting the thresholds of the similarities for the cameras 20 a, 20 b identified by the camera identification portion 404, thereby executing adjustment processing in output of the search results among the cameras. Details of the inter-camera adjustment processing executed by the adjustment portion 405 will be described later with reference to FIG. 7.

The communication portion 406 is configured to receive the human information transmitted from the human information transmission portion 303 of the image analysis server 30 via the network 70, thereby supplying the received human information to the human information management portion 401. Moreover, the communication portion 406 is configured to receive the request for searching for the human targeted for searching from the search terminal device 60, thereby supplying such a request to the search target feature amount extraction portion 402.

The search terminal device 60 includes a display portion 601, a search target selection portion 602, and a communication portion 603. In the search terminal device 60, the display portion 601 is configured to receive, via the communication portion 603, the images delivered from the cameras 20 a, 20 b, the images transmitted from the network storage 50, the human search results transmitted from the human search server 40, etc., and to display these images on the display device.

The search terminal device 60 further includes a user interface for specifying a human as the search target necessary when a search instruction is sent to the human search server 40.

Note that it has been described above that image processing devices such as the cameras 20 a, 20 b, the image analysis server 30, and the human search server 40 forming the human search system process images. However, in these image processing devices, the processing contents are the same even when a video image is acquired and processed for each frame. Thus, these devices are also applicable as video image processing devices to the human search system.

A human image as the search target is input to the search terminal device 60. Specifically, the human image as the search target can be specified in such a manner that human images recorded in the network storage 50 are displayed by the display portion 601 and an image selected from the displayed human images by the user is used. Alternatively, an image held in advance by the user may be used. The search terminal device 60 may transmit the image selected via the user interface by the user to the image analysis server 30 via the communication portion 603, and may cause the image analysis server 30 to analyze the image held in advance.

FIG. 4 illustrates one example of a layout of the human information managed by the human information management portion 401 of the human search server 40. As illustrated in FIG. 4, the human information includes image capturing time 41 at which an image of a human body as a detection target has been captured, an object ID 42 for identifying an object in the image, a camera ID 43 for identifying one of the multiple cameras 20 a, 20 b, and a feature amount 44 extracted from the detected human body. Further, the human information includes a thumbnail 45 and attribute information 46. The thumbnail 45 is a thumbnail image of a human to be displayed on the display device. The thumbnail 45 may be held as part of the human information by the human information management portion 401. Alternatively, only the position of the human in the image may be stored in the human information, and the human information management portion 401 may acquire, when the thumbnail needs to be displayed, a corresponding image from the recording portion 501 of the network storage 50 and generate the thumbnail by cutting out the human position from the acquired image.

The attribute information 46 includes, for example, the age (the age group), sex, and appearance features of the human recognizable from the human image.
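
A record like the following could hold the human information of FIG. 4 in memory; the field names and types are assumptions for illustration, not a schema required by the embodiment.

    from dataclasses import dataclass, field
    from datetime import datetime
    from typing import List, Optional

    @dataclass
    class HumanInformation:
        capture_time: datetime                # image capturing time 41
        object_id: int                        # object ID 42
        camera_id: int                        # camera ID 43
        feature: List[float]                  # feature amount 44
        thumbnail: Optional[bytes] = None     # thumbnail 45 (may be cut out on demand)
        attributes: dict = field(default_factory=dict)  # attribute information 46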

Human Information Registration Processing Flow of Present Embodiment

FIG. 5 is a flowchart of a registration processing procedure for the human information illustrated in FIG. 4, the processing being executed by the image analysis server 30.

The processing illustrated in FIG. 5 may begin, for example, when a communication function of the image analysis server 30 is activated and the image analysis server 30 is brought into a state communicable with other communication devices via the network. Note that the timing of starting the processing illustrated in FIG. 5 is not limited to the above.

The image analysis server 30 can execute the processing illustrated in FIG. 5 in such a manner that the CPU 21 reads out the necessary program from the ROM 22 or the external memory 24 and executes such a program. Note that the processing of FIG. 5 may be implemented in such a manner that at least some of the elements illustrated in FIG. 5 operate as dedicated hardware. In this case, the dedicated hardware operates based on the control of the CPU.

At S51, the communication portion 304 of the image analysis server 30 receives image data transmitted from the camera 20 a, 20 b or the network storage 50. The received image data is expanded and decoded in the image analysis server 30, and is acquired as an image (a moving image or a still image) targeted for human body detection. The image acquired at S51 is sequentially transmitted to the human body detection portion 301.

Note that an image supply source for the image analysis server 30 is not specifically limited, and may be a server device or a recorded video image management device configured to supply an image via a wired or wireless connection, or image capturing devices other than the cameras 20 a, 20 b. Alternatively, the image analysis server 30 may acquire an image as necessary from the memory (e.g., the external memory 24) in the image analysis server 30. Hereinafter, a case where a single image is processed by the image analysis server 30 will be described regardless of whether a moving image or a still image is acquired by the image analysis server 30 at S51. In the former case, the single image is equivalent to each frame forming the moving image. In the latter case, the single image is equivalent to the still image.

At S52, the human body detection portion 301 of the image analysis server 30 uses, e.g., a collation pattern dictionary prepared in advance to execute human body detection processing for the image acquired at S51. Note that the human body detection portion 301 may have the function of detecting a region of the entire human body from the image, and the human body detection processing to be executed is not limited to pattern processing.

Other human body detection methods may include a method described in U.S. Patent Publication No. 2007/0237387, for example. Specifically, according to this method, a detection window with a predetermined size is scanned on an input image, and two-class discrimination for determining whether or not a pattern image as a cutout image in the detection window is a human body is performed for the pattern image. In such discrimination, many weak discriminators are effectively combined using AdaBoost (adaptive boosting) to form a discriminator. This improves discrimination accuracy. Moreover, these discriminators are connected in series to form a cascade detector.

Each weak discriminator has a histograms-of-oriented-gradients (HOG) feature amount. The cascade detector first removes pattern candidates that are obviously not objects by means of the simple discriminator at a preceding stage, and then performs, only for the other candidates, discrimination for determining whether or not the image is the human body by means of the complicated discriminator having a higher discrimination capability at a subsequent stage.

By application of the above-described method, the human body region can be detected from the moving image (a video image).
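
The cascade of HOG-based discriminators itself is the method of U.S. Patent Publication No. 2007/0237387; as a rough stand-in that shows the same scanning-window idea, the sketch below uses OpenCV's stock HOG pedestrian detector, which is a related but different detector. The file path is a placeholder.

    import cv2

    hog = cv2.HOGDescriptor()
    hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

    frame = cv2.imread("frame.jpg")  # one captured image (placeholder path)
    # scan detection windows over the image and keep human-body candidates
    rects, weights = hog.detectMultiScale(frame, winStride=(8, 8), scale=1.05)
    for (x, y, w, h) in rects:
        # draw each detected human body region
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)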

The region in the image targeted for execution of the human body detection processing by the human body detection portion 301 is not necessarily the entirety of the image transmitted from the camera 20 a, 20 b or the network storage 50. For example, the human body detection processing may be executed only for a human body detection processing region set in advance by a parameter of a predetermined value. Alternatively, the maximum and minimum sizes of the human body as the detection target may be specified by parameter setting, and the human body detection processing may not be executed for a region outside such a range. By omitting part of the human body detection processing or the region as described above, the human body detection processing can be speeded up.

Parameter setting as described above can be implemented by processing parameter setting for the human body detection portion 301, and such a processing parameter can be set via the image analysis server 30 or the user interface of the search terminal device 60, for example.

Moreover, the method for acquiring the entire body region of the object by the human body detection portion 301 is not necessarily the above-described method for initially acquiring the entire body region. For example, the human body detection portion 301 may first estimate the entire body region from a position obtained using, e.g., head detection, upper body detection, or face detection, thereby acquiring entire body region information.

For example, in face detection processing, e.g., edges of the eyes and the mouth are detected from the image, and in this manner, a characteristic portion of the face of the human body is detected. That is, in face detection processing, a face region is detected based on a face position, a face size, face likelihood, etc.

For example, the longitudinal length of an upper body region detected in upper body detection may be simply extended downward of the screen by a predetermined number of times, and in this manner, the entire body region may be estimated and acquired. The predetermined number of times may be a fixed value, or may be variably set according to, e.g., a camera installation condition.
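
A minimal sketch of this estimate, assuming an upper-body box in pixel coordinates; the extension factor is an illustrative value, not one prescribed by the embodiment.

    def estimate_entire_body(x, y, w, h, factor=2.5, frame_height=1080):
        """Extend an upper-body box (x, y, w, h) downward of the screen by a
        predetermined factor to approximate the entire body region."""
        full_h = min(int(h * factor), frame_height - y)  # clip at the frame edge
        return (x, y, w, full_h)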

Note that in a crowded place in which many human bodies are present in a captured image, there are overlaps of the human bodies in many cases. In such a case, head detection, upper body detection, etc., which result in fewer hidden portions and fewer detection failures even under a crowded situation, are preferred.

At S53 of FIG. 5, it is determined whether or not the human body has been detected from the acquired image in the human body detection processing of S52. When the human body has been detected at S52 (S53: Y), the processing proceeds to S54. On the other hand, when the human body cannot be detected at S52 (S53: N), the processing proceeds to S56.

In a case where the human body can be detected from the acquired image, the feature amount extraction portion 302 of the image analysis server 30 extracts, at S54, a feature amount of the human body from the human body image detected at S52. The feature amounts to be extracted may include, for example, the positions of organ characteristic points such as the eyes, the nose, the cheeks, the mouth, and the eyebrows forming the face, the luminance of the vicinity of the organ characteristic points, a positional relationship among the organ characteristic points, the average color, average luminance, most frequent luminance, and texture of clothes, a body shape, and a gait.

After the feature amount has been extracted for all human bodies detected from the image at S54, the human information transmission portion 303 of the image analysis server 30 generates, at S55, the human information from the human body feature amount extracted at S54, thereby transmitting the generated human information to the human search server 40 via the communication portion 304. The human information generated at S55 and transmitted to the human search server 40 includes, as illustrated in FIG. 4, information regarding the human, such as the human body feature amount, the human image (the thumbnail), and the human attribute information, and accompanying information such as the ID of the camera having captured the image. The human search server 40 receives the human information transmitted from the image analysis server 30, and the received human information is registered and managed by the human information management portion 401 of the human search server 40.

At S56, the image analysis server 30 determines whether or not the human information registration processing of FIG. 5 is to be continued. For example, it may be determined whether or not the processing is to be continued according to whether or not a processing end instruction has been received from the user. When the image analysis server 30 determines that the processing is to be terminated (S56: Y), the present processing ends. On the other hand, when the image analysis server 30 determines that the processing is to be continued (S56: N), the processing returns to S51, and is continued. Each type of processing in the human information registration processing of FIG. 5 ends through the above-described processing.

Human Search Processing Flow of Present Embodiment

FIG. 6 is a flowchart of a human search processing procedure executed by the human search server 40. The processing illustrated in FIG. 6 may begin, for example, when a communication function of the human search server 40 is activated and the human search request is received from the search terminal device 60. Note that the timing of starting the processing illustrated in FIG. 6 is not limited to the above.

At S61, a human image targeted for searching is selected. Specifically, the human images recorded in the network storage 50 are displayed on the display device of the search terminal device 60, and the user selects the human image targeted for searching from the human images displayed on a user interface of the search target selection portion 602. The human search server 40 receives the image selected from the search terminal device 60, and the processing proceeds to S62.

At S62, the search target feature amount extraction portion 402 of the human search server 40 first executes the human body detection processing for the image acquired at S61.

At S63, in a case where a human body has been detected from the image (S63: Y), the processing proceeds to S64, and the search target feature amount extraction portion 402 extracts a feature amount of the human body as the search target detected at S62. On the other hand, in a case where no human body has been detected from the image (S63: N), the processing proceeds to S67.

An example where the human body detection processing and the feature amount extraction processing are executed by the human search server 40 has been described above, but the present embodiment is not limited to the above. For example, it may be configured such that the human body detection function and the feature amount extraction function of the image analysis server 30 can be utilized from the search terminal device 60 or the human search server 40, and at S62 and S64, these functions of the image analysis server 30 may be called from the human search server 40. It is enough to call a function of any device configured so that the human body detection processing or the feature amount extraction processing can be performed, and such processing may be executed by devices other than the image analysis server 30 having the above-described functions.

Alternatively, when the image selected at the search terminal device 60 at S61 is an image for which the human body has already been detected and the feature amount has already been extracted by the image analysis server 30, the human information management portion 401 may acquire the feature amount of the human targeted for searching from the human information on such a registered human. In this case, S62 to S64 are not necessary, and therefore, can be omitted.

At S65, the search portion 403 executes the human search processing by means of the feature amount of the human as the search target extracted or acquired at S64.

Specifically, the search portion 403 collates the feature amount of the human as the search target extracted at S64 and the feature amounts already registered in the human information to calculate the similarity between these amounts, and returns, as the human search result, the human information on the registered human for which the calculated similarity is equal to or higher than the predetermined threshold.

At S66, the adjustment portion 405 adjusts a threshold for the output (the display) of the human search results obtained at S65.

The multiple cameras 20 a, 20 b placed in a monitoring area are different from each other in installation conditions such as the angle of view, illumination conditions, and image capturing conditions such as camera performance, and therefore, a camera tending to have a lower similarity than those of other cameras is present.

For this reason, when an attempt is made, at the stage at which the human search results are obtained at S65, to directly list the human search results from the images captured by the multiple cameras 20 a, 20 b, the result of the camera 20 a, 20 b tending to have a relatively lower similarity is omitted from the list. That is, even when the image captured by the camera 20 a, 20 b having a relatively low similarity includes the same human as the human targeted for searching, such a human is omitted from the list, and therefore, is missed. On the other hand, in the present embodiment, the similarity difference calculated among the cameras is adjusted at S66 before output of the human search results, and therefore, biased frequency of the display of the search results for the human targeted for searching is reduced. Details of the search result adjustment processing will be described later with reference to FIG. 7.

At S67, the search portion 403 of the human search server 40 lines up, with reference to the similarity threshold obtained at S66 and adjusted among the cameras, the search results according to, e.g., a similarity order or an image capturing time order, thereby displaying the list. Moreover, the search portion 403 performs the control of displaying, with reference to the set predetermined similarity threshold, only the search results for which a similarity equal to or higher than the threshold has been calculated. This similarity threshold may be basically set in advance for the system, but the user can change the similarity threshold to an optional value.

Alternatively, the similarity threshold may be displayed with reference to a threshold set for each camera after execution of the later-described adjustment processing, or may be changed for each camera on the display screen.
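
A sketch of the display filtering and ordering at S67, assuming each search result carries a camera ID and a similarity on the 0-to-1000 scale of FIG. 8, and that per-camera thresholds (when set by the adjustment processing) override the system-wide one; all names are illustrative.

    def results_to_display(results, default_threshold=500, per_camera_thresholds=None):
        """Keep results at or above the applicable threshold and order them
        by descending similarity for the list display."""
        per_camera_thresholds = per_camera_thresholds or {}
        kept = [r for r in results
                if r["similarity"] >= per_camera_thresholds.get(r["camera_id"],
                                                                default_threshold)]
        return sorted(kept, key=lambda r: r["similarity"], reverse=True)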

Each type of processing of the flowchart of FIG. 6 ends through the above-described processing.

Details of Inter-Camera Adjustment Processing

FIG. 7 is a flowchart of a search result adjustment processing procedure executed among the cameras by the human search server 40.

At S661, the adjustment portion 405 of the human search server 40 extracts, for each of the cameras 20 a, 20 b, the human information on a specific target for similarity adjustment from the human information managed by the human information management portion 401 and illustrated in FIG. 4. For example, the human information having a relatively high similarity to the feature amount of the human as the search target selected at S61 can be extracted, as the specific target, from the human information managed by the human information management portion 401.

FIG. 8 illustrates one example of a table obtained as a result of the extraction of the human information executed at S661. The table of FIG. 8 includes a group of an object ID 81, a camera ID 82, and the similarity between the specific target set at S661 and the object, the similarity being calculated for the camera. The similarity illustrated in FIG. 8 is a value of 0 to 1000, and indicates a higher similarity as the value increases. The processing of S661 is executed for each of the multiple cameras 20 a, 20 b.

At S662, the adjustment portion 405 calculates the average of the similarities for each camera by means of the extraction results obtained at S661. This average can be calculated using values within the top 10 of the extraction results of FIG. 8, for example. Alternatively, instead of the top 10 similarities, only the top similarity may be used, for example. Alternatively, the targets for calculation of the average may vary according to the number of specific target extraction results, such as use of the top 10 percent of the number of specific target extraction results for each camera. As described above, the number of extraction results used for calculation of the similarity may be determined such that the number of extractions from the camera or the rate of extraction from the camera is equal among the cameras. Alternatively, all of the extraction results obtained at S661 may be used to calculate the average of the similarities for each camera.
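
A sketch of S662 under the top-10 variant, assuming rows shaped like the table of FIG. 8 (object ID, camera ID, similarity); names are illustrative.

    from collections import defaultdict

    def per_camera_average(rows, top_n=10):
        """Average the top-N similarities tabulated for each camera."""
        by_camera = defaultdict(list)
        for _object_id, camera_id, similarity in rows:
            by_camera[camera_id].append(similarity)
        return {camera_id: sum(sorted(sims, reverse=True)[:top_n]) / min(len(sims), top_n)
                for camera_id, sims in by_camera.items()}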

At S663, the camera identification portion 404 of the human search server 40 identifies one or more of the multiple cameras having a relatively smaller average of the similarities than those of the other cameras. Specifically, the camera identification portion 404 determines whether or not the average of the similarities calculated for each camera at S662 is lower than the average of the similarities for all cameras by a value equal to or greater than a predetermined value, thereby specifying the camera having a low similarity. The average of the similarities for all cameras can be calculated by execution of the processing of S662 for all cameras. Alternatively, at S663, the average of the similarities for a single camera and the average of the similarities for all other cameras may be compared to each other.
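
A sketch of S663; the margin standing in for the predetermined value is an assumed number.

    def identify_low_cameras(camera_averages, margin=100):
        """Return cameras whose similarity average is lower than the average
        over all cameras by the margin or more."""
        overall = sum(camera_averages.values()) / len(camera_averages)
        return [camera_id for camera_id, avg in camera_averages.items()
                if overall - avg >= margin]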

At S664, the adjustment portion 405 determines a similarity adjustment value (an additional value) to be added to the output value of the similarity for the camera identified at S663. The similarity adjustment value determined at S664 can be, for example, set to such a value that the difference between the output value of the similarity for the camera specified at S663, including the similarity adjustment value, and the average of the similarities for all cameras is within a predetermined range. Correction for adding the determined similarity adjustment value to the average of the similarities for the camera identified at S663 is made so that non-displaying of the search results for such a camera can be reduced.
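
One way S664 might choose the additional value is to close the gap between the identified camera's average and the all-camera average; the cap below is an assumption added for illustration.

    def adjustment_value(camera_average, overall_average, max_adjustment=200):
        """Additive adjustment that closes the gap to the overall average,
        never negative and capped for safety."""
        return max(0.0, min(overall_average - camera_average, max_adjustment))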

Alternatively, in the present embodiment, the similarity threshold as a threshold for determining whether or not the human search result is to be displayed at S67 can be set separately for the cameras 20 a, 20 b. The search portion 403 may compare the similarity threshold set separately for the cameras as described above to the similarity obtained as a result of human search, thereby performing the display control of displaying, on the display device, only the human search results exceeding the similarity threshold. In this case, the similarity threshold is, in displaying, lowered for the camera 20 a, 20 b having a low similarity, and therefore, non-displaying of the search results for such a camera can be reduced. The decrement of the similarity threshold in this case may be, for example, set to such an extent that the difference between the average of the similarity output values for the camera identified at S663 and the similarity average for all cameras is within a predetermined range.
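
The threshold-lowering alternative admits an equally small sketch, reusing the same per-camera and overall averages; this is one plausible reading, not the embodiment's prescribed formula.

    def lowered_threshold(base_threshold, camera_average, overall_average):
        """Lower a camera's display threshold by the amount its similarity
        average falls short of the overall average."""
        return base_threshold - max(0.0, overall_average - camera_average)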

Note that an example where calculation of the similarity average and calculation of the adjustment value are executed with a single human being specified as the specific target has been described above, but multiple humans may be specified as the specific targets at S661.

In a case where multiple humans are the specific targets as described above, the adjustment portion 405 may calculate, at S662, the similarity average for each human specified as the specific target, and calculate the similarity average for such a camera from the similarity averages for the multiple humans. Moreover, at S663, the similarity average for all cameras and the similarity average for each camera may be compared to each other, and the camera identification portion 404 may execute the processing as in the case of a single human.

Note that at S661, multiple humans different from each other in attribute are preferably specified as the specific targets. This is because it is assumed that the human attribute difficult to detect according to the image capturing conditions of the installed camera is different among the multiple cameras. Specifically, bias such as a difficulty in obtaining a feature amount of a human wearing dark color clothes by a certain camera and a difficulty in obtaining a feature amount of a short human by other cameras attached at high positions might differ according to attribute. In this case, the attribute (a face, an entire body, the color of clothes, etc.) difficult to acquire by the camera is determined for each camera, and may be notified to the user.

When multiple humans different from each other in attribute are specified as the specific targets as described above, non-displaying of the search result by the camera for which it is difficult to search for the human in the list of the search results for the multiple cameras can be reduced.

Each type of processing of the flowchart of FIG. 7 ends through the above-described processing.

An example where the similarity average for each camera is utilized at S663 has been described above, but the present embodiment is not limited to the above. For example, the maximum value of the similarity for each camera may be utilized instead of the average, or the camera to be adjusted or the adjustment value therefor may be determined based on whether or not the average deviation or standard deviation of the similarity for each camera is equal to or greater than a predetermined value.

Moreover, an example where the adjustment value is added at S664 to the similarity for the camera identified at S663 and the similarity to which the adjustment value has been added is displayed has been described above, but the present embodiment is not limited to the above. For example, the camera identified at S663 may be displayed, without the adjustment value being automatically added, as the camera from which it is difficult to obtain the search result on the display device of the search terminal device 60, and in this manner, such a camera may be notified to the user.

Further, the degree of difficulty in obtaining the search result by the camera may also be displayed on the display device, and the user may select, according to such a degree, displaying/non-displaying of data with the adjusted search result. Alternatively, for alerting the user, even when the camera identified at S663 has a low similarity, the detection result of such a camera may be preferentially displayed as the search result in, e.g., a separate frame in displaying so that such a detection result and other detection results can be distinguished from each other.

Moreover, the adjustment processing executed by the adjustment portion 405 has been described above as a flow along with the human search processing using the user's human search request as a trigger, but the present embodiment is not limited to the above. For example, the adjustment processing may be executed in advance independently of the human search processing.

In this case, in a state in which the human information is accumulated to a certain degree before the user issues the human search request, an image of an object optionally selected from the human information managed by the human information management portion 401 may be set as the specific target of S661, and the adjustment processing may be executed multiple times. As described above, the camera tending to have a low similarity can be identified in advance of the human search processing.

Alternatively, only the human information identified as the same human may be set as the specific target of S661, and the similarity may be tabulated for each camera at S662. In this case, it is guaranteed that the human as the specific target and the human compared to the specific target at S662 are the same human. Thus, the output value of the similarity for each of the multiple cameras may be adjusted at S664 such that the similarity average for the specific target becomes equal among the cameras, i.e., the similarities for the multiple cameras are within a predetermined range. Thus, a bias of the output values of the similarities among all of the cameras having captured the specific target and having detected the feature amount can be flattened.
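
When the compared humans are guaranteed to be the same, the flattening described above could assign each camera an additive offset that moves its similarity average onto the average over all cameras; a minimal sketch with illustrative names.

    def per_camera_offsets(camera_averages):
        """Offset per camera so that every adjusted average equals the
        average over all cameras."""
        overall = sum(camera_averages.values()) / len(camera_averages)
        return {camera_id: overall - avg for camera_id, avg in camera_averages.items()}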

For acquiring, from the human information managed by the human information management portion 401, only the human information on a specific human guaranteed to be the same human as described above, multiple candidate images considered to be the same human may be presented on the screen of the display device in the search terminal device 60, for example. As described above, a user interface allowing the user to select the same human from the presented candidate images is provided, and therefore, it is guaranteed that the humans set as the specific targets are the same human.

Further, in the human search system, a mode for sequentially capturing an image of only the same human moving among the multiple cameras to guarantee the same human may be provided as an inter-camera adjustment mode, and the human information recorded during the inter-camera adjustment mode may be set as the specific target. As long as the human information on the same human as the specific target can be extracted from the human information on multiple humans managed by the human information management portion 401, the method is not limited to the above, and other methods may be used. For example, human images of the same already-registered human which has passed may be manually selected by the user. Alternatively, a tracking portion configured to track a human among the multiple cameras may be provided to set the human identified by the tracking portion as the specific target.

GUI Example of Search Target Input and Human Search Result Output

One example of a graphical user interface (GUI) provided by the search terminal device 60 in the present embodiment will be described in detail with reference to FIGS. 9 to 13.

FIG. 9 illustrates an example of displayed search results in human search before the inter-camera adjustment processing (FIG. 7) for the search results in the present embodiment is executed.

In FIG. 9, a search image setting region 91 configured to set an imageof the search target includes a search image checking region 911 forchecking a human image currently set as the search target, and an imageselection button 912, and a search start button 913.

A search result display region 92 configured to display the search results in human search includes a region where multiple human images 921 as the search results and detailed information 922 on the multiple humans corresponding to the human images can be displayed as a list.

In the search result display region 92, a reliability threshold adjustment slider 93, an inter-camera adjustment switch 94, a display order designation column 95, and a scroll bar 96 are provided. The reliability threshold adjustment slider 93 is a slider configured to variably adjust a reliability for filtering the humans to be displayed in the search result display region 92 from the search results in human search. In FIG. 9, the reliability is an index of certainty that each human displayed in the search result display region 92 is the same as the human set as the search target. The value of the reliability displayed as the detailed human information 922 in the search result display region 92 can be associated with the value of the similarity calculated by the human search server 40, or such a similarity value can be used as-is.

The inter-camera adjustment switch 94 is a switch configured to manually instruct the adjustment portion 405 of the human search server 40 to activate the inter-camera adjustment processing.

When the user activates human search, an image of the search target is first specified in the search image setting region 91. When the image selection button 912 is pressed, another screen for selecting the image of the search target can be displayed, and the user specifies an optional image of the search target on this search target image selection screen. The image specified herein may be specified from the list of previous human search result images, or an optional human image for which the human information has been registered may be specified, for example.

The user can further input, as search conditions, a search time range etc. in the search image setting region 91. After the necessary search conditions have been input, the user presses the search start button 913 to instruct the human search server 40 to start human search.

In the human search server 40, when the series of human search processing illustrated in FIG. 6 is executed, the search result display region 92 displays the list of the human images 921 of the candidates matching the search conditions specified by the user and the detailed information 922 on these humans. The detailed information 922 displayed together with the human images 921 in the search result display region 92 can be acquired from the entirety or part of the human information on each human, and may further include the reliability obtained from the similarity calculated for the human. In FIG. 9, a reliability order is displayed in the display order designation column 95, indicating that the search results are displayed in descending order of reliability in the search result display region 92. The descending and ascending orders may be switchable.

FIG. 9 illustrates the example where a reliability threshold is set to 500, and only the humans for which a reliability (a similarity) equal to or higher than 500 has been calculated are displayed from the search results in the search result display region 92.
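To make this filtering and ordering concrete, a minimal sketch follows of how the search result display region 92 could apply the reliability threshold from the slider 93 together with the descending reliability order; the record layout and names are assumptions for illustration only.

```python
def display_results(results, threshold=500, descending=True):
    """results: list of dicts with 'camera' and 'reliability' keys
    (hypothetical layout). Keeps only results whose reliability reaches
    the threshold set by the slider 93, sorted by reliability."""
    visible = [r for r in results if r["reliability"] >= threshold]
    return sorted(visible, key=lambda r: r["reliability"], reverse=descending)

results = [
    {"camera": "Cam1", "reliability": 620},
    {"camera": "Cam3", "reliability": 400},  # filtered out at threshold 500
    {"camera": "Cam2", "reliability": 550},
]
for r in display_results(results):
    print(r["camera"], r["reliability"])  # Cam1 620, then Cam2 550
```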

The inter-camera adjustment switch 94 is turned ON so that an instruction for switching between displaying the search results subjected to inter-camera adjustment and displaying the non-adjusted search results can be provided to the human search server 40.

FIG. 10 illustrates a display example after the inter-camera adjustment switch 94 has been turned ON in FIG. 9 to execute the inter-camera adjustment processing.

In comparison with FIG. 9, a new human image 923 and corresponding detailed information 924 are displayed in the search result display region 92 in FIG. 10 because the inter-camera adjustment switch 94 has been turned ON. In a region of the detailed information 924, the reliability is displayed as “400+.”

Since a reliability (a similarity) of 400 for the human image 923 falls below the reliability threshold of 500, the human image 923 captured by a camera “Cam3” and the corresponding detailed information 924 would not normally be displayed. However, since the inter-camera adjustment switch 94 is operated, the human image 923 captured by the camera “Cam3” and the corresponding detailed information 924 are displayed as a new candidate human image although the reliability falls below the set reliability threshold of 500. In the detailed information 924, e.g., “+” is added to the reliability so that the user can visually recognize that the displayed human image 923 is newly displayed as a result of execution of the inter-camera adjustment processing.
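One way to picture how such a below-threshold result is surfaced and flagged is the sketch below, which admits a result when its inter-camera-adjusted value reaches the threshold and appends “+” to its displayed reliability. The names are assumptions for illustration, not the embodiment's prescribed implementation.

```python
def annotate_with_adjustment(results, adjustments, threshold=500):
    """Return (camera, displayed_reliability) rows. A result whose raw
    reliability falls below the threshold is still shown, flagged with
    '+', if its inter-camera-adjusted value reaches the threshold."""
    rows = []
    for r in results:
        adjusted = r["reliability"] + adjustments.get(r["camera"], 0)
        if r["reliability"] >= threshold:
            rows.append((r["camera"], str(r["reliability"])))
        elif adjusted >= threshold:
            # Displayed only because of the inter-camera adjustment.
            rows.append((r["camera"], f'{r["reliability"]}+'))
    return rows

results = [{"camera": "Cam1", "reliability": 620}, {"camera": "Cam3", "reliability": 400}]
print(annotate_with_adjustment(results, {"Cam3": 200}))
# [('Cam1', '620'), ('Cam3', '400+')]
```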

Further, in FIG. 10, an image display frame of the human image 923 and a display frame of the detailed information 924 newly displayed as a result of execution of the inter-camera adjustment processing are surrounded by, e.g., dotted lines, and therefore, the human image 923 and the detailed information 924 can be visually recognized in distinction from other search results. Note that displaying with display frames surrounded by dotted lines is one example, and any type of displaying, such as blink displaying, may be employed as long as the search result can be visually recognized in distinction from other search results.

Similarly, the reliability threshold adjustment slider 93 may be operated so that human images and human information newly displayed as a result of a manual change in the reliability (similarity) threshold can be visually recognized in distinction from other search results. For example, in a case where the number of human images in the initially-displayed search results is small, the reliability threshold adjustment slider 93 can be moved to lower the reliability threshold and display more search results in human search.

FIG. 11 illustrates a display example in a case where the search result display order has been changed to a camera order in the display order designation column 95.

In the display order designation column 95, the search result display order can be selected from any of the reliability order, the camera order, and a time order, for example. In the search result display region 92 of FIG. 11, the human images 923, 925, 927 as the search results and the detailed information 924, 926, 928 including the reliabilities for the humans are sorted and displayed for each camera. In FIG. 11, “INTER-CAMERA ADJUSTMENT PERFORMED” is noted in the display region for the search results of the camera “Cam3” in the search result display region 92 so that the search results of the camera “Cam3” obtained as a result of execution of the inter-camera adjustment processing can be visually recognized.
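The selectable display orders amount to choosing a sort key over the same result set; a minimal sketch under the same assumed record layout (the 'captured_at' field is likewise hypothetical):

```python
from operator import itemgetter

def sort_results(results, order="reliability"):
    """order: 'reliability' (descending), 'camera', or 'time' (ascending by
    capture time), as selectable in the display order designation column 95."""
    if order == "reliability":
        return sorted(results, key=itemgetter("reliability"), reverse=True)
    if order == "camera":
        return sorted(results, key=itemgetter("camera"))
    if order == "time":
        return sorted(results, key=itemgetter("captured_at"))
    raise ValueError(f"unknown display order: {order}")
```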

The example where the inter-camera adjustment processing is automatically activated and executed has been described with reference to FIGS. 6 and 7, but the present embodiment is not limited to the above. For example, the user can separately adjust the similarity threshold for each of the multiple cameras 20a, 20b. FIGS. 12 and 13 illustrate one example of a user interface for setting the similarity (reliability) threshold for each camera as described above. Via the user interface illustrated in FIGS. 12 and 13, the user can check in advance, so as to guarantee, that a search target image 1211 specified at S611 of FIG. 7 and a selected image 1221 as a candidate captured image from each camera show the same human. Moreover, the similarity threshold can be adjusted separately for each camera.

An adjustment human image specifying region 121 of FIG. 12 is a region where the human image to be used for adjusting the similarity threshold among the cameras is specified as the specific target. An image selection button 1212 is pressed to select the human image as the specific target in a manner similar to that for specifying the search image of FIG. 9, and a search execution button 1213 is pressed to instruct the human search server 40 to execute human search.

The human selected in the adjustment human image specifying region 121 is preferably a human whose image has been captured by all of the multiple cameras 20a, 20b targeted for adjustment. Thus, as described above, the inter-camera adjustment mode may be provided such that an image of only the same human is captured sequentially while the human moves across the multiple cameras to guarantee sameness. The human information recorded during the inter-camera adjustment mode may be automatically set as the image as the specific target.

When human search is executed, the human image for which the highest similarity to the specific target image 1211 has been calculated within each camera is displayed as the selected image 1221 in a similarity threshold setting region 122 functioning as a search result display region. The selected image 1221 is displayed for each of the multiple cameras 20a, 20b. In a case where the human image displayed as the selected image 1221 differs from the human specified as the adjustment search target image 1211, the user presses a change button 1222. By pressing the change button 1222, the human images previously captured by the camera are listed, and an image of the same human as the human specified as the adjustment search target image can be specified as the selected image 1221 for that camera from the list of the multiple human images.
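Selecting, for each camera, the candidate image with the highest similarity to the specific target can be sketched as follows; the tuple layout is an assumption for illustration.

```python
def best_candidate_per_camera(candidates):
    """candidates: iterable of (camera_id, image_id, similarity) tuples.
    Returns, per camera, the image with the highest similarity, as shown
    in the similarity threshold setting region 122."""
    best = {}
    for camera_id, image_id, similarity in candidates:
        if camera_id not in best or similarity > best[camera_id][1]:
            best[camera_id] = (image_id, similarity)
    return best

candidates = [("Cam1", "img1", 610), ("Cam3", "img7", 400), ("Cam3", "img8", 350)]
print(best_candidate_per_camera(candidates))
# {'Cam1': ('img1', 610), 'Cam3': ('img7', 400)}
```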

In FIG. 12, a similarity 1224 calculated for a selected image 1223 for the camera “Cam3” is displayed as “400.” A similarity of “400” falls below the set similarity threshold of “500,” which shows that humans captured by the camera “Cam3” tend to have lower similarities than those calculated for humans captured by the other cameras.

To display more search results, also including the human images captured by the camera “Cam3,” the user can change a similarity threshold 1225 for the camera “Cam3” from the default of “500” to “300,” for example. Alternatively, an automatic setting button 123 may be pressed to execute the inter-camera adjustment processing illustrated in FIG. 7, thereby applying the adjustment value determined for each camera. In this manner, the similarity threshold to be set for each camera may be automatically adjusted.

As described above, when the similarity threshold 1225 for the camera “Cam3” has been lowered from “500” to “300,” i.e., by 200, human images for which a similarity (a reliability) equal to or higher than the changed similarity threshold 1225 has been calculated are subsequently newly displayed in the search result display region 92. Specifically, as illustrated in FIG. 10, the reliability displayed for the camera “Cam3” in the human search result display region 92 is calculated as a value (600) to which the adjustment difference of 200 has been added, and therefore, the display order for the entirety of the search results is changed. That is, the display order for the search results is changed to descending order of the difference between the threshold set for each camera and the similarity after adjustment for that camera.
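The arithmetic of this example can be traced with a short sketch (names assumed): lowering the threshold for “Cam3” from 500 to 300 yields an adjustment difference of 200, so its raw reliability of 400 is displayed as 600, and results are ordered by the margin between the adjusted value and each camera's threshold.

```python
def ordered_after_threshold_change(results, thresholds, default=500):
    """results: dicts with 'camera' and 'reliability'. thresholds: per-camera
    overrides, e.g. {'Cam3': 300}. The adjustment difference for a camera is
    the default threshold minus its own; results passing their camera's
    threshold are shown in descending order of adjusted value minus threshold."""
    rows = []
    for r in results:
        th = thresholds.get(r["camera"], default)
        adjusted = r["reliability"] + (default - th)
        if r["reliability"] >= th:
            rows.append({**r, "adjusted": adjusted, "margin": adjusted - th})
    return sorted(rows, key=lambda r: r["margin"], reverse=True)

results = [{"camera": "Cam1", "reliability": 620}, {"camera": "Cam3", "reliability": 400}]
for row in ordered_after_threshold_change(results, {"Cam3": 300}):
    print(row["camera"], row["adjusted"], row["margin"])
# Cam3 600 300
# Cam1 620 120
```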

In this manner, a bias in the human search results among the cameras can be adjusted.

FIG. 13 illustrates a display example of similarity threshold setting for cameras different from those of FIG. 12, displayed when a scroll bar 124 is operated to scroll downward from the state of FIG. 12.

FIG. 13 illustrates that, for a selected image 1226 for a camera name “Cam6,” an appropriate search result cannot be obtained and therefore no image is displayed. In this case, the user can press the change button 1222 to select the same human as the human as the specific target for adjustment, as in the selection screen change operation of FIG. 12.

Alternatively, in a case where the image of the human as the specific target has not been captured by the camera “Cam6” in the first place, or where the human image as the specific target cannot be searched, the state of the selected image 1226 illustrated in FIG. 13 results. Note that in this case, the operation of selecting the selected image 1226 or changing the similarity threshold is not necessarily executed.

As described above, according to the present embodiment, the images captured by the multiple image capturing devices are analyzed. When the image search result for the search target is output from the image analysis processing results, the image capturing device having a lower similarity between the search target and the registered image than those calculated for other image capturing devices is identified. Then, in the present embodiment, the similarity calculated for the identified image capturing device can be adjusted such that the image captured by such an image capturing device is included in the output image search results. Moreover, when a low similarity is calculated, the identified image capturing device and the search result for the image captured by such an image capturing device can be notified in distinction from other image capturing devices and their image search results. With this configuration, a bias in the output image search results among the multiple image capturing devices due to, e.g., a difference in the image capturing conditions is adjusted. Thus, failure to output search results for the same human as the search target is reduced, and the accuracy of image search is improved.

Other Embodiments

Note that each of the above-described embodiments can be implemented in combination.

Moreover, the present invention can be implemented by a program for implementing one or more functions of the above-described embodiments. That is, the present invention can be implemented by processing in which the program is supplied to a system or a device via a network or a storage medium and one or more processors in a computer (or a CPU, an MPU, etc.) of the system or the device read out and execute the program. Moreover, the program may be provided while being recorded in a computer-readable recording medium.

Moreover, each of the above-described embodiments may be applied to a system including multiple types of equipment such as a host computer, interface equipment, an image capturing device, and a web application, or may be applied to a device including a single type of equipment.

Further, the present invention is not limited to a configuration in which the functions of the embodiments are implemented by execution of the programs read out by the computer. For example, based on a program instruction, an operating system (OS) operating on the computer may perform part or the entirety of the actual processing, and the functions of the above-described embodiments may be implemented by such processing.

According to the present invention, the captured images from the multiple cameras can be properly used to perform image search, regardless of a difference in the image capturing conditions among the cameras.

While the present invention has been described with reference to exemplary embodiments, it is to be understood that the invention is not limited to the disclosed exemplary embodiments. The scope of the following claims is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structures and functions.

This application claims the benefit of Japanese Patent Application No. 2018-047731, filed Mar. 15, 2018, which is hereby incorporated by reference herein in its entirety.

What is claimed is:
 1. A search result display processing method in a monitoring system configured to display, as a search result, an image similar to an object of a search target from images captured by multiple cameras, the method comprising: a specifying step of specifying a camera to be adjusted among the multiple cameras based on a history of a similarity for the object searched from the images captured by the multiple cameras; a calculation step of calculating similarities for the images captured by the multiple cameras with respect to the object of the search target; an addition step of adding an adjustment value to a similarity for an image captured by the specified camera among the similarities calculated at the calculation step; and a display processing step of performing processing of displaying images in descending order of the similarity among the image having the similarity to which the adjustment value has been added at the addition step and an image captured by a camera different from the camera specified at the specifying step.
 2. The display processing method according to claim 1, wherein at the specifying step, a camera having a lower similarity than an average of the similarities for the images captured by the multiple cameras by a value equal to or greater than a predetermined value is specified as the camera to be adjusted.
 3. The display processing method according to claim 2, wherein the predetermined value is set according to user operation.
 4. The display processing method according to claim 1, wherein at the specifying step, a camera having a lower similarity than a maximum value of the similarities for the images captured by the multiple cameras by a value equal to or greater than a predetermined value is specified as the camera to be adjusted.
 5. The display processing method according to claim 4, wherein the predetermined value is set according to user operation.
 6. The display processing method according to claim 1, wherein at the display processing step, the image from the camera specified as the camera to be adjusted is displayed in distinction from images of other cameras.
 7. A search result processing device in a monitoring system configured to output, as a search result, an image similar to an object of a search target from images captured by multiple cameras, the device comprising: a specifying unit configured to specify a camera to be adjusted among the multiple cameras based on a history of a similarity for the object searched from the images captured by the multiple cameras; an addition unit configured to add an adjustment value to a similarity for an image captured by the specified camera among similarities calculated by a calculation unit which calculates the similarities for the images captured by the multiple cameras with respect to the object of the search target; and a processing unit configured to perform processing of outputting images in descending order of the similarity among the image having the similarity to which the adjustment value has been added by the addition unit and an image captured by a camera different from the camera specified by the specifying unit.
 8. The processing device according to claim 7, wherein the specifying unit specifies, as the camera to be adjusted, a camera having a lower similarity than an average of the similarities for the images captured by the multiple cameras by a value equal to or greater than a predetermined value.
 9. The processing device according to claim 8, wherein the predetermined value is set according to user operation.
 10. The processing device according to claim 7, wherein the specifying unit specifies, as the camera to be adjusted, a camera having a lower similarity than a maximum value of the similarities for the images captured by the multiple cameras by a value equal to or greater than a predetermined value.
 11. The processing device according to claim 10, wherein the predetermined value is set according to user operation.
 12. The processing device according to claim 7, wherein the processing unit displays the image from the camera specified as the camera to be adjusted in distinction from images of other cameras.
 13. A non-transitory computer-readable storage medium for storing a program for executing a search result processing method in a monitoring system configured to output, as a search result, an image similar to an object of a search target from images captured by multiple cameras, wherein the processing method comprises: a specifying step of specifying a camera to be adjusted among the multiple cameras based on a history of a similarity for the object searched from the images captured by the multiple cameras; a calculation step of calculating similarities for the images captured by the multiple cameras with respect to the object of the search target; an addition step of adding an adjustment value to a similarity for an image captured by the specified camera among the similarities calculated at the calculation step; and a processing step of performing processing of outputting images in descending order of the similarity among the image having the similarity to which the adjustment value has been added at the addition step and an image captured by a camera different from the camera specified at the specifying step.
 14. The non-transitory computer-readable storage medium according to claim 13, wherein at the specifying step, a camera having a lower similarity than an average of the similarities for the images captured by the multiple cameras by a value equal to or greater than a predetermined value is specified as the camera to be adjusted.
 15. The non-transitory computer-readable storage medium according to claim 14, wherein the predetermined value is set according to user operation.
 16. The non-transitory computer-readable storage medium according to claim 13, wherein at the specifying step, a camera having a lower similarity than a maximum value of the similarities for the images captured by the multiple cameras by a value equal to or greater than a predetermined value is specified as the camera to be adjusted.
 17. The non-transitory computer-readable storage medium according to claim 16, wherein the predetermined value is set according to user operation.
 18. The non-transitory computer-readable storage medium according to claim 13, wherein at the processing step, the image from the camera specified as the camera to be adjusted is displayed in distinction from images of other cameras.